Wide  Area  Information  Servers: 

A  Supercomputer  on  every  Desk 


Brewster  Kahle 
Thinking  Machines  Corporation 




What  I  really  want.. 


• My personal information to be accessible

•  Published  information  should  find  me 

•  Usable  anywhere 

•  Others  can  use  what  I  have  learned  (if  I  want  them  to) 
Electronic Publishing

New Communications Technology Problems


BOOKS

Experts only: Monks
Distribution is hard and expensive: Vellum is calfskin
Different interfaces: 1000's of languages in Europe alone
Material is intractable: Scrolls and manuscripts were about as random access as musical scores
Business model needed: Centralized printing
Navigation Techniques: Paper


•  Alphabetical  Listings  (dictionary,  Encyclopedia) 

•  Indices  (back  of  the  book  and  Readers  Guide) 

•  Table  of  Contents  (outlining) 

•  Citation  index 

•  "Tree  of  Knowledge" 

•  Have  you  read  any  good  books  lately? 
• Hierarchical File Systems

•  Unix  "find"  and  "grep",  Mac  "find  file" 

•  Boolean  query  systems  (...within  5  words  of...) 

•  Static  Hypertext  links  (see  also  pointers) 
Navigation Techniques: WAIS

•  English  language  questions  and 
Relevance  feedback 

*  Iterative  retrieval 

*  Question-answer  dialog 

* Similar to the newspaper front page:
"continued on page 5"

*  Dynamic  Hypertext  Links 

•  2  level  search: 

*  Directory  of  servers  (server  like  any  other) 

*  Servers  themselves 

•  Copy  editors  help  select  documents 

*  Easy  to  "publish"  opinions  on  documents 
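The two-level search listed above can be sketched in a few lines. This is a toy, in-memory illustration only: the `Server` class, the `two_level_search` function, the word-overlap scoring, and the sample data are all assumptions made for the example, not the WAIS implementation. It shows the one idea from the slide: the directory of servers is searched like any other server, and its hits name the servers to ask next.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """A toy in-memory server mapping headlines to document text (hypothetical)."""
    name: str
    documents: dict = field(default_factory=dict)

    def search(self, words):
        """Return (score, headline) pairs, best first; score = query words present."""
        wanted = {w.lower() for w in words}
        hits = []
        for headline, text in self.documents.items():
            score = len(wanted & set(text.lower().split()))
            if score > 0:
                hits.append((score, headline))
        return sorted(hits, reverse=True)


def two_level_search(directory, servers, words):
    """Level 1: ask the directory which servers look relevant.
    Level 2: send the same words to each of those servers."""
    results = []
    for _, server_name in directory.search(words):
        for score, headline in servers[server_name].search(words):
            results.append((score, server_name, headline))
    return sorted(results, reverse=True)


# Hypothetical data. The directory is just another Server whose
# "documents" are short descriptions of the other servers.
servers = {
    "poetry": Server("poetry", {"Sonnet 18": "shall i compare thee to a summer day"}),
    "weather": Server("weather", {"Forecast": "storm warning for the coast"}),
}
directory = Server("directory-of-servers", {
    "poetry": "poetry verse sonnet summer",
    "weather": "weather forecast storm maps",
})

found = two_level_search(directory, servers, ["summer", "sonnet"])
print(found)
```

Because the directory reuses the same `search` method, a client needs no special code path to discover servers, which is the point of "server like any other".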
[Screenshot: WAIStation Question window showing the source "CM applications", the "Which are similar to" and "In these sources" boxes, and the Results pane]

Step  1 :  Sources  are  dragged  with  the  mouse  into  the  Question  Window.  A 
question  can  contain  multiple  sources.  When  the  question  is  run,  it  asks  for 
information  from  each  included  source. 
Step 2: When a query is run, headlines of documents satisfying
the query are displayed.
Technology: Computer Firms See the Writing


International Business Machines Corp., Apple Computer Inc.
and other big computer makers are staking out positions in
the nascent market for "note-pad computers," small machines
that let users enter data by writing rather than tapping
keys. The note pads typically recognize numbers and letters
printed on a screen with a special pen and convert them into
conventional electronic characters. The information is then
stored for later transfer to a personal computer or a
company's main computers.

The size of the market for note-pad computers isn't clear,
but Infocorp, a Santa Clara, Calif., market-research firm,
estimates the market will grow to 3.4 million units sold in
1995 from 22,000 units this year. Only one company, Tandy
Corp.'s Grid Systems unit, currently sells note-pad computers
in the U.S.; its model, introduced last September, is priced
at $3,000. But new ventures are expected to introduce several
note-pad machines this year. And already, big computer makers
are fighting quietly for control over software standards for
these gadgets, which require different programs from those

Compaq Computer Directors Approve 2-for-1 Stock Split
International: Bull Agrees to Pay Zenith $15 Million to En...
AT&T Set to Announce Memorex Computer Accord
Technology Brief -- International Business Machines: Pric...
Step 4: To refine the search, any one or more of the result
documents can be moved to the "Which are similar to:" box.
When the search is run again, the results will be updated
to include documents which are "similar" to the ones selected.
Figure 1: The Source description contains all the necessary
information for contacting an information server.
WAIS Clients


•  Busy  24  hours  a  day  finding  information 

•  Ponder  all  indications  of  the  preferences  of  its  user 

•  Gossip  with  other  clients  about  their  discoveries 

•  Scours  the  world  (within  a  budget)  to  find  new  sources 
WAIS Protocol

• Based on Z39.50, bypass proprietary period
• Flexible

• Non-threatening for corporations

• Search: (words, doc_ids, databases) -> server
returns list of: (headline, score, doc_id, types)

• Retrieval: (doc_id, type, start, end) -> server
returns: bunch of bytes

• Doc_id: An ISBN for the Electronic Age

((orig_server, orig_database, orig_local_id)
(dist_server, dist_database, dist_local_id))

• Server Description:

(:ip-address, :database-name, :cost, description)
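The search and retrieval tuples above map naturally onto a few small records. The sketch below is a hedged illustration of that shape, not the wire protocol: `DocId`, `SearchHit`, and `ToyServer` are hypothetical names, the score is a plain substring count, and the `doc_ids` and `databases` inputs of a search are left out for brevity. The field names of `DocId` follow the slide.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DocId:
    """Where a document originated and where this copy is distributed from."""
    orig_server: str
    orig_database: str
    orig_local_id: str
    dist_server: str
    dist_database: str
    dist_local_id: str


@dataclass
class SearchHit:
    """One element of the list a search returns."""
    headline: str
    score: int
    doc_id: DocId
    types: Tuple[str, ...]


class ToyServer:
    """In-memory stand-in for a server (an assumption for illustration)."""

    def __init__(self, name, database, texts):
        self.name, self.database, self.texts = name, database, texts

    def search(self, words: List[str]) -> List[SearchHit]:
        """Score each document by how often the query words occur in it."""
        hits = []
        for local_id, text in self.texts.items():
            score = sum(text.lower().count(w.lower()) for w in words)
            if score:
                doc_id = DocId(self.name, self.database, local_id,
                               self.name, self.database, local_id)
                headline = text.splitlines()[0]
                hits.append(SearchHit(headline, score, doc_id, ("TEXT",)))
        return sorted(hits, key=lambda h: h.score, reverse=True)

    def retrieve(self, doc_id: DocId, doc_type: str, start: int, end: int) -> bytes:
        """Return the requested byte range: the slide's "bunch of bytes"."""
        data = self.texts[doc_id.dist_local_id].encode("utf-8")
        return data[start:end]


server = ToyServer("example-host", "demo", {
    "1": "Weather report\nRain expected",
    "2": "Recipe\nBread",
})
hits = server.search(["rain"])
chunk = server.retrieve(hits[0].doc_id, "TEXT", 0, 7)
print(hits[0].headline, chunk)
```

Keeping retrieval as a byte range of a typed document is what lets the same two calls serve text, pictures, or any other format.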
Connection Machine Server


• 1-25 GBytes (and getting bigger)

•  Supports  thousands  of  users 

•  Automatic  Indexing 

•  Uses  words  and  phrases  in  question  to  find 
appropriate  documents 

•  First  turn-key  massively  parallel  application 
TMC Internet Release

• CM product for TCP/IP (complete server)

• Example user interfaces for free (no support)
Macintosh, GNU Emacs, X Windows

• Example Unix server software to create servers

• Directory of Servers on the Internet at least through '91

• 25 Servers now: Weather Maps, patents, Government
programs, Risks-digest, Usenet recipes, Lewis Carroll,...

• Anonymous FTP Think.com:/public/wais/*
Mailing list: wais-discussion-request@think.com
WAIS Daily Usages on Quake.Think.Com

[Chart: number of clients, number of different hosts, and number of searches per day (0 to 460 uses) against days since April 16, 1991 (0 to 60)]

Usage in 1 day

600 searches max on Quake
140 searches ave on CM
18 searches ave on Poetry
59 different max hosts

Total usage of Quake
in 2 months

Different hosts: 508
Number of Clients: 6729
Number of Searches: 12652
Number of Retrievals: 33897
Total Transactions: 46549

Countries Using WAIS:

Austria, Canada, Denmark, Finland, France, Germany, Holland, Italy, Mexico,
Norway, Sweden, Switzerland, USA
WAIS Servers


Top level server of servers (maintained by Thinking Machines):
directory-of-servers.src

Connection Machine documentation (servers on Connection Machine):
CM-fortran-manual.src  CM-paris-manual.src  CM-star-lisp-docs.src
CM-tech-summary.src  CMFS-documentation.src  CM-applications.src

MIT algorithms book addendum (servers at MIT):
MIT-algorithms-bug.src  MIT-algorithms-exercise.src
MIT-algorithms-suggest.src

Internet directories etc (servers at NSF and Thinking Machines):
internet-documents.src  internet-drafts.src  internet-resource-guide.src
internet-rfcs.src

PD programs for mainframes (server in Georgia):
cosmic-abstracts.src  cosmic-programs.src  US-Gov-Programs.src

Picture servers:
sample-pictures.src  weather.src

Mail archive servers (various places):
jik-usenet.src  sun-spots.src  risks-digest.src
homebrew.src  info-mac.src

Server in Oslo, Norway:
UiO_Publications.src              ;;Research interests of professors

Library catalogs (various places):
tmc-library.src  online-libraries.src

Servers on WAIS:
wais-discussion-archives.src  wais-docs.src

Misc.
Molecular-biology.src             ;;genetics abstracts
NIH-Guide.src                     ;;guide to RFP's
bible.src                         ;;King James Bible
usenet-cookbook.src               ;;Cookbook
jargon.src                        ;;Hacker's Dictionary
world-factbook.src                ;;CIA descriptions of countries
poetry.src                        ;;Shakespeare, Yeats, Sawyer, etc
patent-sampler.src                ;;20Mbytes of patents (full text)
rkba.src                          ;;Right to keep and bear arms documents
sample-books.src                  ;;A few books such as Lewis Carroll's etc
wall-street-journal-sample.src    ;;Couple of months from 1989 WSJ


Conclusion 

•  Electronic  Publishing  can  fill  niches  now 

•  Companies  are  positioning  themselves  now 
(workstations,  server,  and  info  providers) 

•  Thinking  Machines  is  the 
"Engine  of  the  Information  Industry" 


WAIS


Wide  Area  Information  Servers: 
A  Supercomputer  on  every  Desk 


Brewster  Kahle 
Thinking  Machines  Corporation 


What  I  really  want... 


•  My  personal  information  to  be  accessible 

•  Published  information  should  find  me 

•  Usable  anywhere 

•  Others  can  use  what  I  have  learned  (if  I  want  them  to) 
What is it?

Electronic  Publishing 

(Or  publishing  over  wires) 
New Communications Technology Problems

(columns: Books / Telephone / Electronic Publishing)

Experts only
  Books: Monks
  Telephone: Operators
  Electronic Publishing: Professional searchers

Distribution is hard and expensive
  Books: Vellum is calfskin
  Telephone: Telephones on barb wire
  Electronic Publishing: $1/minute over obscure modems

Different interfaces
  Books: 1000's of languages in Europe alone
  Telephone: Switching was manual
  Electronic Publishing: //query (W5) inform?

Material is intractable
  Books: Scrolls and manuscripts were about as random access as musical scores
  Telephone: No white pages
  Electronic Publishing: 600 databases on Dialog, ~1 Terabyte; 140 Gbyte at DJ; 80 GB card catalog at RLG

Business model needed
  Books: Centralized printing
  Telephone: Pay per minute
  Electronic Publishing: Not understood
Navigation Techniques: WAIS


•  English  language  questions  and  Relevance  feedback 

•  Question-answer  dialog 

•  Similar  to  Newspapers:  "More  on  page  5" 

•  Dynamic  Hypertext  Links 

•  2  level  search: 

•  Directory  of  servers  (server  like  any  other) 

•  Servers  themselves 



Wide Area Information Server Architecture

[Diagram labels: DowJones; Directory of Servers; Gateways to other nets; TV Guide etc.; Image Servers; Private Servers; X.25, TCP/IP, Modem; Open Interconnection; Public Protocol]

Users Needs:
Selecting Servers
Answering Questions
Organizing Responses

Architecture Issues:
Scalability
Security
Business model for servers
Reliable Access
Peat Marwick System Structure

Operations: Archiving, Queries, Retrieval
IR Type: Broadcast, Query by Example
Databases: Wall St Journal, Barron's, 400 Business Mags

[Diagram labels: LAN; Server; X.25, Z39.50, 9600 baud; Connection Machine; Workstations]

CM:
Operations: Queries
IR Type: enhanced relevance feedback
DBs: DowVision and memos, mail, word processor files

Mac:
Operations: Human Int, Retrieval, Queries, "Caching" Docs, User Profiles
IR Type: Query by example
DBs: Personal Text, Cached data
Hardware Components

[Diagram labels:]
Connection Machine CM2a
Front-End
GatorBox: Gateway from AppleTalk to Ethernet
Macintosh running WAIStation via MacTCP
AppleTalk Zone
Workstation running WAIS via X-Windows or GMACS


WAIS  Clients 


•  Busy  24  hours  a  day  finding  information 

•  Ponder  all  indications  of  the  preferences  of  its  user 

•  Gossip  with  other  clients  about  their  discoveries 

•  Scours  the  world  (within  a  budget)  to  find  new  sources 

•  Current  implementations  on  PC,  Macintosh, 
X  Windows,  NeXT,  dumb  terminal  (dial-up) 


WAIS  Protocol 


• Based on NISO Z39.50 international standard

•  Flexible  —  separates  clients  from  servers 

• Search: (words, doc_ids, databases) returns list of:
(headline, score, doc_id, types)

• Retrieval: (doc_id, type, start, end) returns:
data of specified type

• Doc_id: An ISBN for the Electronic Age

• Server Description Structure for the Directory of Servers


How  Standard  Protocol 
can  Provide  Security 


•  Users  do  not  login  to  server,  but  search  only 
through  application  layer  protocol  (Z39.50) 

•  Server  controls  access  to  data 

•  Network  layers  below  application,  or 
application  layer  handles  authentication, 
encryption,  billing 
The WAIS Protocol


•  Supports  any  search  syntax 

•  Supports  sophisticated  clients  —  puts  intelligence 
in  the  user's  hands 

•  Clients  can  run  on  any  platform 

•  Multiple  servers  in  a  single  search 

•  Retrieve  any  kind  of  data:  text,  graphics,  video,... 
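"Multiple servers in a single search" amounts to sending one question to several sources and merging the answers by score. A minimal sketch under stated assumptions: `search_all`, the word-count score, and the sample sources are hypothetical, and real servers would be contacted over the network rather than read from a dictionary.

```python
import heapq


def search_all(sources, words, limit=5):
    """Run the same word list against every source and merge hits by score."""
    merged = []
    for source_name, documents in sources.items():
        for headline, text in documents.items():
            tokens = text.lower().split()
            score = sum(tokens.count(w.lower()) for w in words)
            if score:
                merged.append((score, source_name, headline))
    # Keep only the best-scoring hits across all sources, best first.
    return heapq.nlargest(limit, merged)


# Hypothetical sources standing in for separate servers.
sources = {
    "patents": {
        "Pen input device": "a pen tablet for computers",
        "Bread oven": "an oven for baking bread",
    },
    "news": {
        "Note-pad computers": "pen computers and more pen computers",
    },
}

top = search_all(sources, ["pen", "computers"])
print(top)
```

The merged list carries the source name with each hit, so the client can show one ranked list and still know where to retrieve each document.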
Connection Machine Server


•  1-100GBytes  (and  getting  bigger) 

•  Supports  thousands  of  users 

•  Automatic  Indexing 

•  Uses  words  and  phrases  in  question  to  find 
appropriate  documents  with  relevance  feedback, 
weighted  term 

•  Supports  Boolean  Queries 

•  Cost  effective  hardware  alternative  to  mainframes 
Data Parallelism:

Searching  all  the  documents  at  once 

Pharmaceutical  +12 

FDA  +9 
Medical  +6 


Stadium  / 
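The weighted terms on this slide suggest scoring every document with the same small function. The sketch below assumes that reading: the weights 12, 9, and 6 come from the slide, while the `score` function, the sample documents, and the use of Python's `map` as a stand-in for one processor per document are assumptions for illustration.

```python
# Query term weights as shown on the slide.
QUERY_WEIGHTS = {"pharmaceutical": 12, "fda": 9, "medical": 6}


def score(document):
    """Sum the weights of the query terms that appear in one document."""
    words = set(document.lower().split())
    return sum(weight for term, weight in QUERY_WEIGHTS.items() if term in words)


# Hypothetical documents. On a data-parallel machine each one would be
# scored by its own processor at the same time; map() stands in for that.
documents = [
    "FDA approves new pharmaceutical product",
    "Stadium opens for the season",
    "Medical device maker files with FDA",
]

scores = list(map(score, documents))
print(scores)
```

Since `score` reads only its own document, the work divides cleanly across as many processors as there are documents.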

Boolean  Search 


Retrieve  documents 
containing  specific 
combinations  of 
words 


Conceptual  Search 


Explore  a  set  of 
documents 
containing  related 
concepts 
Boolean Query


Hard to Use:
Complex Syntax


(Japanese OR Japan) AND

(building OR buildings OR (Real AND Estate)) AND

(Manhattan OR (New AND York))


Poor  Results: 

The  wrong  information 

No  ranking  of  results 


Have  you  been  paying  attention?... 
Freer  Finance:  U.S.  Regulators  Move... 
REAL  ESTATE:  California  Initiatives... 
First  Boston  Said  To  Agree  on  Sale  Of... 
Exxon,  Rockefeller  Group  to  Sell  Site... 
What's  News-Business  and  Finance 
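The Boolean query on this slide can be written directly as a predicate, which makes both complaints visible: the syntax is fiddly, and the answer is only yes or no. A hedged sketch follows; the `matches` function and the three sample texts are hypothetical, and the parentheses are balanced the way the query appears to intend.

```python
def matches(text):
    """Evaluate the slide's Boolean query against one document's words."""
    words = set(text.lower().split())
    has = words.__contains__
    return (
        (has("japanese") or has("japan"))
        and (has("building") or has("buildings") or (has("real") and has("estate")))
        and (has("manhattan") or (has("new") and has("york")))
    )


# Hypothetical documents.
documents = [
    "Japanese investors buy Manhattan building",
    "Japan trade talks resume in New York over real estate taxes",
    "Developers sell midtown tower to Tokyo firm",
]

results = [matches(d) for d in documents]
print(results)
```

The second text satisfies every clause while being about something else, and the third is on topic but fails the query. The result is a flat true or false per document, so there is nothing to rank by.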


Conceptual  Search:  Phase  1 


Easy  to  Use: 
No  Syntax 


Options: 

What  do  you 
want  to  follow 
up? 


Japanese  buying  real  estate  in  mid-town  manhattan 


1. Time Acts to Cut Magazine Costs...

2.  First  Boston  Said  To  Agree  on  Sale... 

3.  Have  You  Been  Paying  Attention? 

4.  Exxon,  Rockefeller  Group  to  Sell  Site... 

5.  Hard  Sell:  Real  Estate  Developers... 

6.  What's  News-Business  and  Finance... 

7.  Integrated  Resources  Buys  Loft  Building. 


Conceptual Search: Phase 2


Relevance 
Feedback: 

I  like  these; 
show  me  more 

Improved  results: 

Articles  on  related 
topics  are  found 

Results  are  ranked 


First  Boston  Said  To  Agree  on  Sale... 
Exxon,  Rockefeller  Group  to  Sell  Site... 


1.  Bids  for  Exxon  Building  in  New  York... 

2.  Time  Acts  to  Cut  Magazine  Costs... 

3.  Hard  Sell:  Real  Estate  Developers... 

4.  Time  Inc.  Sells  Its  45%  Interest... 

5.  Citicorp  Unit  Moves  to  Foreclose  on... 

6.  Litigious  Landlords:  Legal  Maneuvers 
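Relevance feedback ("I like these; show me more") can be sketched as adding the words of the liked documents to the question and ranking again. This is one simple way to do it, assumed for illustration and not taken from the WAIS code: `tokens`, `rank`, `with_feedback`, the stop-word list, and the sample documents are all hypothetical.

```python
from collections import Counter

STOP_WORDS = {"the", "a", "of", "in", "to", "on", "for", "and"}


def tokens(text):
    """Lower-case words of a text, minus a few common stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def with_feedback(question, liked_texts):
    """Weight each word by how often it occurs in the question and liked documents."""
    weights = Counter(tokens(question))
    for text in liked_texts:
        weights.update(tokens(text))
    return weights


def rank(weights, documents):
    """Return titles of matching documents, highest weighted score first."""
    scored = []
    for title, text in documents.items():
        counts = Counter(tokens(text))
        score = sum(weight * counts[word] for word, weight in weights.items())
        if score:
            scored.append((score, title))
    return [title for _, title in sorted(scored, reverse=True)]


# Hypothetical documents.
documents = {
    "tower": "japanese firm buys manhattan tower",
    "site": "rockefeller group to sell manhattan site",
    "bids": "bids for exxon building near rockefeller site",
    "costs": "magazine costs cut",
}
question = "japanese buying real estate in manhattan"

first = rank(with_feedback(question, []), documents)
second = rank(with_feedback(question, [documents["site"]]), documents)
print(first, second)
```

After the "site" document is marked as liked, the "bids" document appears in the ranking even though it shares no word with the original question, which is the "articles on related topics are found" effect described on the slide.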
Query Broadcast To Database
on Connection Machine System

[Figure: query terms (Tripoli, Libyan, PLO, bomb) shown with document units and their scores (75, 42, 80, 52, 17)]
Results Improve with Query Size

[Chart: precision x recall at 25% recall, average performance over 13 reference sets, plotted against query size (axis ticks 10 to 100); the typical Boolean query and the typical relevance feedback query are marked]

Document  Retrieval  Performance 


Current  algorithm  limits: 

~2 GB with 512 MB CM-2

~8 GB with 2 GB CM-2
~25 GB with 8 GB CM-2

High recall, high precision: see Stanfill and Kahle,
Communications of the ACM, December 1986

«  1  sec.  response 

Much  larger  DBs  searchable  with  CM-5 

and  inverted  index  algorithms:  100s  to  1000s  of  Gigabytes 
WAIStation:
active database sources,

[Screenshot: WAIStation Sources and Questions windows. Sources: CM applications, Encyclopedia, King James Bible, Macintosh Hard Disk, TMC Business email, TMC Library Catalog, Wall St. Journal, World Factbook. Questions: CM Apps Question, Library question, Encyclopedia Q, Patent Q, TMC Bus. Email Q, TMC Fun Q, Montvale Q, World Factbook Q, poetry q, Bible Q]
Select Data Source

[Screenshot: Sources window (CM applications, Encyclopedia, King James Bible, Macintosh Hard Disk, TMC Business email, World Factbook) beside an empty Question-1 window with "Look for documents about", "Which are similar to", "In these sources", and "Results" panes and a Run button]
Run Initial Query

[Screenshot: Question-1 looks for documents about "recent developments in personal computers". Results:]
Compaq Computer Directors Approve 2-for-1 Stock Split
International: Bull Agrees to Pay Zenith $15 Million to En...
AT&T Set to Announce Memorex Computer Accord
Technology Brief -- International Business Machines: Pric...
Business Brief -- Data General Corp.: Four Models Are Un...
Technology: Computer Firms See the Writing on the Scree...
Retailing: Businessland Enters Japan, Aided by 4 Big Loca...
Corrections & Amplifications
Click a Headline to
Display a Document

[Screenshot: Question-1 ("recent developments in personal computers") with its results list; the headline "Technology: Computer Firms See the Writing on the Scree..." has been clicked and the article "Technology: Computer Firms See the Writing" is open in a document window]
Relevance feedback:
"Find me more like this one"

[Screenshot: the document "Technology: Co..." has been placed in the "Which are similar to" box of Question-1; the source is Wall St. Journal and the results of the initial query are still listed]
Relevance Feedback of Paragrap...

[Screenshot: a passage of the article "Technology: Computer Firms See the Writing", which begins "Computer makers are scrambling to cash in on people who find the pen mightier than the keyboard.", is shown with Question-1; "Technology: Co..." is in the "Which are similar to" box, the source is Wall St. Journal, and the headline "Technology: Computer Firms See the Writing on the Scree..." is marked *** in the results]
"Chaining" of Questions
to Follow a Tangent

[Screenshot: Question-1 ("recent developments in personal computers", similar to "Technology: Co...", source Wall St. Journal) with its results. The result "Retailing: Businessland Enters Japan, Aided by 4 Big Loca..." is used as the "Which are similar to" document of a new Question-2, whose results are:]
Retailing: Businessland Enters Japan, Aided by 4 Big Loca...
What's News -- Business and Finance
Technology: Computer Makers Agree on a Standard For N...
Inside Track: Businessland Directors Take a Loss And Tra...
Technology & Health: Businessland To Report Loss For 3r...
Technology: U.S. Computer Maker Takes on NEC on Its Ov...
Technology: Computer Firms See the Writing on the Scree...
TMC Internet Release


• CM product for TCP/IP (complete server)

• Example user interfaces for free (no support)
Macintosh, GNU Emacs, X Windows

• Example Unix server software to create servers

• Directory of Servers on the Internet at least through '91

• 160 Servers now: Weather Maps, patents, journal
abstracts, email archives, Usenet recipes,...

• Free Software via FTP from Think.com:/wais/*

Mailing list: wais-discussion-request@think.com


WAIS  Uses 


•  Over  10,000  users  on  the  Internet 

•  Users  in  24  Countries:  Mexico,  Singapore, 
Finland,  Australia,  etc 

•  160  Databases  served  from  9  Countries: 
Norway,  Canada,  UK,  etc. 

Average  3  new  databases  registered  per  week. 
WAIS Uses:
Campus  Wide  Info  Servers 


•  Class  catalog  and  schedule 

•  Campus  events:  movies,  sports 

•  Job  listings 

•  Library  catalog 

•  Phone  book 

•  Professor  research  interests 

•  Past  theses 


[sol.acs.unt.edu] UNTComputerDoc
[xantos.uio.no] UiO_Publications
[next2.oit.unc.edu] ibm.pc.FAQ
WAIS Uses: Libraries


•  Easy  to  use  card  catalog 

•  Remote  use  from  home  or  office 

•  Pictures,  full  text,  scanned  documents 

[pegun.law.columbia.e] columbia-law-library-catalog
[pegun.law.columbia.e] columbia-spanish-law-catalog
[quake.think.com] tmc-library


WAIS  Uses:  Biology 


•  Journal  Abstracts 

•  Sequence  archives 

•  Images 

Currently  over  20  Biology  databases  in 
Finland,  Netherlands,  and  US 


[cmns.think.com] Molecular-biology
[bio.vu.nl] biology-compounds
[genbank.bio.net] biology-journal-contents
[wais.funet.fi] bionic-ai-researchers
[wais.funet.fi] bionic-directory-of-servers
[wais.funet.fi] bionic-enzyme


WAIS  Uses:  Chemistry 
CORE  Project 


•  All  published  chemistry  (8  years  all  ACS) 

•  Scanned  pictures,  ascii  text 

•  Optical  jukebox  mass  storage 

•  Connection  Machine  /  Newton  search  engines 

Project of: Bellcore, ACS, Chem Abstracts,
OCLC, Cornell, and Thinking Machines

[cujo.curtin.edu.au] chem-eng-current-contents

WAIS Uses:
Business  Executives 


•  Dow  Jones  information 

•  Corporate  information 

•  Personal  information 

Project:  KPMG,  Apple,  Thinking  Machines, 
Dow  Jones 

[cmns.think.com] wall-street-journal-sampl
[think.com] Business-email
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WAIS  

WAIS  Uses: 
Medical  Researchers/Doctors 


•  Medical  papers 

•  Storing  and  matching  patient  records 

•  Remote  connections  to  specialized  databases 

[wais.funet.fi] bionic-databases-limb
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WAIS 


WAIS Uses:
Community  Information 

•  Dial-up  users:  no  network  required 

•  Directories  of  services  or  facilities 

•  Education  and  entertainment 


[quake.think.com] internet-resource-guide
[sol.acs.unt.edu] online-libraries
[quake.think.com] weather
[lambada.oit.unc.edu] nsf-bulletins
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Conclusion 


•  Electronic  Publishing  can  fill  niches  now 

•  Companies  are  positioning  themselves  now 
(workstations,  server,  and  info  providers) 

•  Thinking  Machines  is  the 
"Engine  of  the  Information  Industry" 
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What  is  it? 

Electronic  Publishing 

(Or  publishing  over  wires) 
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New  Communications  Technology  Problems 


Problem: Experts only
    Books: Monks
    Telegraph/Telephone: Operators
    Electronic Publishing: Professional searchers

Problem: Distribution is hard and expensive
    Books: Vellum is calfskin
    Telegraph/Telephone: Telephones on barb wire
    Electronic Publishing: $1/minute over obscure modems

Problem: Different interfaces
    Books: 1000's of languages in Europe alone
    Telegraph/Telephone: Switching was manual
    Electronic Publishing: //query (W5) inform?

Problem: Material is intractable
    Books: Scrolls and manuscripts were about as random access as musical scores
    Telegraph/Telephone: No white pages
    Electronic Publishing: 600 databases on Dialog, ~7 Terabyte; 140Gbyte at DJ; 80GB card catalog at RLG

Problem: Business model needed
    Books: Centralized printing
    Telegraph/Telephone: Pay per minute
    Electronic Publishing: Not understood
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Navigation  Techniques:  Paper 


•  Alphabetical  Listings  (dictionary,  Encyclopedia) 

•  Indices  (back  of  the  book  and  Readers  Guide) 

•  Table  of  Contents  (outlining) 

•  Citation  index 

•  "Tree  of  Knowledge" 

•  Have  you  read  any  good  books  lately? 
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WAIS


Navigation  Techniques:  Computers 

•  Hierarchical  File  Systems 

•  Unix  "find"  and  "grep",  Mac  "find  file" 

•  Boolean  query  systems  (...within  5  words  of...) 

•  Static  Hypertext  links  (see  also  pointers) 
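
The proximity operator mentioned in the list above ("...within 5 words of...") can be illustrated with a minimal Python sketch. The function name `within_n_words` and the whitespace tokenization are assumptions made for illustration; the slides do not say how any particular Boolean query system implements the operator.

```python
def within_n_words(text, first, second, n=5):
    """Return True if `first` and `second` occur within `n` words of each other."""
    # Naive tokenization (an assumption): split on whitespace, strip punctuation.
    words = [w.strip('.,;:!?"()').lower() for w in text.split()]
    first_pos = [i for i, w in enumerate(words) if w == first.lower()]
    second_pos = [i for i, w in enumerate(words) if w == second.lower()]
    # Any pair of occurrences close enough satisfies the query.
    return any(abs(i - j) <= n for i in first_pos for j in second_pos)
```

A query of this kind only answers yes or no for each document, which is one reason the deck contrasts it with ranked, English-language questions.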
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Navigation  Techniques:  WAIS 

•  English  language  questions  and 
Relevance  feedback 

*  Iterative  retrieval 

*  Question-answer  dialog 

*  Similar to a newspaper front page's
"continued on page 5"

*  Dynamic  Hypertext  Links 

•  2  level  search: 

*  Directory  of  servers  (server  like  any  other) 

*  Servers  themselves 

•  Copy  editors  help  select  documents 

*  Easy  to  "publish"  opinions  on  documents 
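
The 2 level search listed above can be sketched roughly as follows. This is a toy model under stated assumptions: the `DIRECTORY` and `SERVERS` contents, the function names, and the word-overlap ranking are all hypothetical and are not taken from the WAIS software. The point it shows is that the directory of servers is queried with the same `search` call as any other server.

```python
# Hypothetical in-memory stand-ins for real servers; all text is illustrative.
DIRECTORY = {
    "weather": "weather forecasts and weather maps",
    "tmc-library": "library catalog of books and reports",
    "biology-journal-contents": "tables of contents for biology journals",
}
SERVERS = {
    "weather": ["Weather map for Boston", "Forecast for Helsinki"],
    "tmc-library": ["Book: Parallel Computing", "Report: Text retrieval"],
    "biology-journal-contents": ["Journal issue: enzymes"],
}

def words(text):
    return set(text.lower().replace(":", " ").split())

def search(documents, question):
    """Rank documents (name -> text) by the number of words shared with the question."""
    q = words(question)
    scored = [(len(q & words(text)), name) for name, text in documents.items()]
    return [name for score, name in sorted(scored, reverse=True) if score > 0]

def two_level_search(question):
    hits = {}
    # Level 1: the directory of servers is searched like any other server.
    for server in search(DIRECTORY, question):
        # Level 2: the selected servers are asked the same question.
        docs = {d: d for d in SERVERS[server]}
        hits[server] = search(docs, question)
    return hits
```

For example, `two_level_search("weather maps")` first selects the `weather` entry from the directory and then ranks that server's own documents.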
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WAIS 


Wide  Area  Information  Server 

Architecture 


DowJones 


Directory  of 
Servers 


Gateways 
other  nets 


TV  Guide 
etc. 


X.25,  TCP/IP,  Modem 
Open  Interconnection 
Public  Protocol 


Image 
Servers 

Private 
Servers 

Users  Needs: 
Selecting  Servers 
Answering  Questions 
Organizing  Responses 


Architecture  Issues: 
Scalability 
Security 

Business  model  for  servers 
Reliable  Access 
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Demonstration  System  Structure 


[Diagram labels: WAIS, LAN, Server, Connection Machine, Workstations; link labeled X.25, Z39.50, 9600 Baud]

Server:
    Operations: Archiving, Queries, Retrieval
    IR Type: Broadcast, Query by Example
    Databases: Wall St Journal, Barron's, 400 Business Mags

CM:
    Operations: Queries
    IR Type: enhanced relevance feedback
    DBs: DowVision and memos, mail, word processor files

Mac:
    Operations: Human Int, Retrieval, Queries, "Caching" Docs, User Profiles
    IR Type: Query by example
    DBs: Personal Text, Cached data
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WAIS  Clients 


•  Busy  24  hours  a  day  finding  information 

•  Ponder all indications of their users' preferences

•  Gossip with other clients about their discoveries

•  Scour the world (within a budget) to find new sources
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WAIS  Protocol 

•  Based  on  Z39.50,  bypass  proprietary  period 

•  Flexible 

•  Non-threatening for corporations

•  Search: (words, doc_ids, databases) -> server
returns list of: (headline, score, doc_id, types)'s

•  Retrieval: (doc_id, type, start, end) -> server
returns: bunch of bytes

•  Doc_id: An ISBN for the Electronic Age

((orig_server, orig_database, orig_local_id)
(dist_server, dist_database, dist_local_id))

•  Server  Description: 

(:ip-address,  :database-name,  :cost,  description) 
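
The two operations and the identifier tuples on this slide can be modeled as plain data types. The sketch below is an illustration only, not the Z39.50 wire format: the class names, the field types, the `"TEXT"` type label, and the in-memory `ToyServer` are assumptions. Only the field lists come from the slide.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class DocId:
    # Where the document originated, and where this copy is distributed from.
    orig_server: str
    orig_database: str
    orig_local_id: str
    dist_server: str
    dist_database: str
    dist_local_id: str

@dataclass
class SearchHit:
    headline: str
    score: int
    doc_id: DocId
    types: Tuple[str, ...]

@dataclass
class ServerDescription:
    ip_address: str
    database_name: str
    cost: float          # unit is not given on the slide; float is an assumption
    description: str

class ToyServer:
    """In-memory stand-in for one database on one server (illustrative only)."""
    def __init__(self, name, database, texts):
        self.name, self.database, self.texts = name, database, texts

    def _doc_id(self, local_id):
        return DocId(self.name, self.database, local_id,
                     self.name, self.database, local_id)

    def search(self, words, doc_ids=(), databases=()):
        # Search: (words, doc_ids, databases) -> list of (headline, score, doc_id, types)
        # doc_ids and databases are accepted but ignored in this toy version.
        hits = []
        for local_id, text in self.texts.items():
            tokens = text.lower().split()
            score = sum(tokens.count(w.lower()) for w in words)
            if score:
                headline = text.splitlines()[0]
                hits.append(SearchHit(headline, score, self._doc_id(local_id), ("TEXT",)))
        return sorted(hits, key=lambda h: h.score, reverse=True)

    def retrieve(self, doc_id, type_, start, end):
        # Retrieval: (doc_id, type, start, end) -> bunch of bytes
        return self.texts[doc_id.orig_local_id].encode("utf-8")[start:end]
```

Because retrieval takes a byte range, a client can fetch a long document in pieces. Keeping both the originating and the distributing triple in the identifier lets a cached copy still name its source.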
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Connection  Machine  Server 


•  1-25GBytes  (and  getting  bigger) 

•  Supports  thousands  of  users 

•  Automatic  Indexing 

•  Uses  words  and  phrases  in  question  to  find 
appropriate  documents 

•  First  turn-key  massively  parallel  application 
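
The idea of using the words in a question (or in an example document) to score every document and keep only the best can be sketched as below. This is a hedged illustration: the word-overlap score and the names `overlap` and `best_matches` are assumptions, not the Connection Machine algorithm, and the thread pool only mimics the structure of a data-parallel comparison (Python threads do not give a real speedup here).

```python
import heapq
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def term_counts(text):
    return Counter(text.lower().split())

def overlap(example, document):
    # Number of word occurrences the two texts share (multiset intersection).
    return sum((term_counts(example) & term_counts(document)).values())

def best_matches(example, documents, k=3):
    """Score every document against the example, then keep only the top k."""
    with ThreadPoolExecutor() as pool:
        scores = list(pool.map(lambda d: overlap(example, d), documents))
    # Pair each score with its document index and keep the k largest.
    ranked = heapq.nlargest(k, zip(scores, range(len(documents))))
    return [documents[i] for score, i in ranked if score > 0]
```

Every document is scored independently of the others, which is what makes the comparison easy to spread across many processors.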
[Figure: An example document is compared to all the others, in parallel. Only the best matches are presented to the user.]


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Results  Improve  with  Query  Size 


[Figure: plot of precision x recall (@ 25% recall) against the number of query terms, with the axis running from 100 down. Average performance over 13 reference sets.]
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How  Fast? 


10-term query (? = value illegible in the source)

DB Size    Procs   DVs   Time    Storage Method
?          ?       0     0.055   Main Memory
?          ?       0     0.055   Main Memory
?          ?       0     0.055   Main Memory
?          ?       0     0.055   Main Memory
?          ?       0     0.055   Main Memory
?          ?       1     1.7     Independent Disk
?          ?       1     2.8     Independent Disk
256 GB     16K     2     3.6     Striped Disk
512 GB     32K     4     3.6     Striped Disk
1024 GB    64K     8     3.6     Striped Disk
2048 GB    64K     16    5.1     Striped Disk
4096 GB    64K     32    8.2     Striped Disk
8192 GB    64K     64    12.4    Striped Disk

Estimates  based  on  synthetic  database,  benchmark  code. 


[Screenshot: a Sources window listing CM applications, Encyclopedia, King James Bible, Macintosh Hard Disk, TMC Business email, TMC Library, and World Factbook, next to an empty Question window with the fields "Look for documents about", "Which are similar to", "In these sources", and "Results".]

Step 1: Sources are dragged with the mouse into the Question
Window. A question can contain multiple sources. When the
question is run, it asks for information from each included source.
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WAIStation 

Step  2 


[Screenshot: Question window]

Look for documents about: recent developments in personal computers
In these sources: Wall St. Journal

Results
*** Compaq Computer Directors Approve 2-for-1 Stock Split
*** International: Bull Agrees to Pay Zenith $15 Million to En...
*** AT&T Set to Announce Memorex Computer Accord
*** Technology Brief - International Business Machines: Pric...
*** Business Brief - Data General Corp.: Four Models Are Un...
Technology: Computer Firms See the Writing...
*** Retailing: Businessland Enters Japan, Aided by 4 Big Loca...

Step  2:  When  a  query  is  run,  headlines  of  documents  satisfying 
the  query  are  displayed. 
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WAIStation 


[Screenshot: the Question window with the query "recent developments in personal computers" and its result headlines. The document "Technology: Computer Firms See the Writing" is open in its own window:]

International Business Machines Corp., Apple Computer Inc.
and other big computer makers are staking out positions in
the nascent market for "note-pad computers," small machines
that let users enter data by writing rather than tapping
keys. The note pads typically recognize numbers and letters
printed on a screen with a special pen and convert them into
conventional electronic characters. The information is then
stored for later transfer to a personal computer or a
company's main computers.

The size of the market for note-pad computers isn't clear,
but Infocorp, a Santa Clara, Calif., market-research firm,
estimates the market will grow to 3.4 million units sold in
1995 from 22,000 units this year. Only one company, Tandy
Corp.'s Grid Systems unit, currently sells note-pad computers
in the U.S.; its model, introduced last September, is priced
at $3,000. But new ventures are expected to introduce several
note-pad machines this year. And already, big computer makers
are fighting quietly for control over software standards for
these gadgets, which require different programs from those


Step  3:  With  the  mouse,  the  user  clicks  on  any  result  document 
to  retrieve  it. 
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Step  4 
[Screenshot: the Question window with the query "recent developments in personal computers" and the result headlines from Step 2.]

Step 4: To refine the search, any one or more of the result
documents can be moved to the "Which are similar to:" box.
When the search is run again, the results will be updated
to include documents which are "similar" to the ones selected.
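
One simple way to model this refinement step is to add the most frequent words of the chosen documents to the original question before searching again. The sketch below shows that idea only. The slides do not describe how WAIS computes similarity, so the function name `expand_query`, the stop-word list, and the term-frequency heuristic are all assumptions.

```python
from collections import Counter

# A tiny illustrative stop-word list (an assumption, not from the slides).
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "and", "for", "is", "are"}

def expand_query(question, similar_docs, extra_terms=5):
    """Add the most frequent non-stop-words of the chosen documents to the question."""
    query = question.lower().split()
    counts = Counter(
        w
        for doc in similar_docs
        for w in doc.lower().split()
        if w not in STOP_WORDS and w not in query
    )
    return query + [w for w, _ in counts.most_common(extra_terms)]
```

Running the expanded query again yields results that share vocabulary with the selected documents, which is the behavior the step describes.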
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TMC Internet Release

CM  product  for  TCP/IP  (complete  server) 

Example  User  interfaces  for  free  (no  support) 
Macintosh,  Gnu  Emacs,  Xwindows 

Example  unix  server  software  to  create  servers 

Directory of Servers on the Internet at least through '91

42 Servers now: Weather Maps, patents, Government
programs, Risks-digest, Usenet recipes, Lewis Carroll, ...

Anonymous FTP Think.com:/public/wais/*
Mailing  list:  wais-discussion-request@think.com 
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WAIS 


WAIS  Uses:  Documentation 


•  Up-to-date  documentation 

•  Online  help  system 

•  Distribution  of  bug  notices  and  fixes 

•  Mailing  list  archives 

CMNS.Think.com  CM-Fortran.src
Quake.think.com  wais-talk.src
PRISM  CM  programming  environment 
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Conclusion 


•  Electronic  Publishing  can  fill  niches  now 

•  Companies  are  positioning  themselves  now 
(workstations,  server,  and  info  providers) 

•  Thinking  Machines  is  the 

"Engine  of  the  Information  Industry" 
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